On the front of the OmniPage Pro 9.0 box, we promise "over 99% accuracy" and add the restriction, "using laser-quality documents with standard fonts."
You may wonder exactly what we mean by "over 99% accuracy."
We mean that a user scanning clean, sharp original documents printed with normal fonts such as Times Roman or Arial can expect on average to see errors in fewer than one word out of every 100. "On average" means that on a given page of 400 words, a user might experience no errors at all, or more than four. But given enough pages, the average will be fewer than four errors per 400-word page.
An immediate question might be: how do we know this? The answer is that we scanned and recognized 10,000 pages using OmniPage Pro in order to come up with a meaningful average. These weren't 10,000 copies of the same page, but rather an array of different pages designed to reflect real-world documents. And we used special software tools designed to compare the test documents and the recognized documents word for word.
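To make the idea of a word-for-word comparison concrete, here is a minimal sketch in Python. It is not the test software described above, which is not documented here. The function names are hypothetical, and the use of Python's difflib to line up the two word sequences is an assumption made purely for illustration.

```python
from difflib import SequenceMatcher


def word_errors(reference: str, recognized: str) -> int:
    """Count reference words that the OCR output failed to reproduce exactly.

    Assumption: a word counts as an error if it is missing or altered.
    Extra words inserted by the recognizer are not counted in this sketch.
    """
    ref_words = reference.split()
    ocr_words = recognized.split()
    # Align the two word sequences and total the words that match exactly.
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(ref_words) - matched


def average_word_accuracy(pages) -> float:
    """Pool (reference_text, recognized_text) pairs into one accuracy figure."""
    total_words = 0
    total_errors = 0
    for reference, recognized in pages:
        total_words += len(reference.split())
        total_errors += word_errors(reference, recognized)
    return 1.0 - total_errors / total_words


# Two tiny illustrative "pages"; the second has one misrecognized word.
sample_pages = [
    ("the quick brown fox", "the quick brown fox"),
    ("jumps over the lazy dog", "jumps ovcr the lazy dog"),
]
accuracy = average_word_accuracy(sample_pages)
```

Pooling the counts over all pages, rather than averaging per-page percentages, keeps short pages from weighing as much as long ones. That is one reasonable reading of "on average," not necessarily the one used in the actual test.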
In case you're wondering, a trained data entry operator commits an error, on average, once every 300 characters. Translated into word accuracy, and assuming roughly six characters per word with spaces included, this means that on a 400-word page the typist will commit on average eight errors, or more than twice as many as OmniPage Pro. As for typing speed, there's no contest. The data entry operator types at a rate of 30-40 words per minute, while OmniPage Pro converts words from paper to computer-editable format at the rate of thousands of words per minute. It's also faster at finding errors and letting the user correct them.
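The arithmetic behind this comparison can be checked with a short Python sketch. The figure of six characters per word (five letters plus a space) is an assumption chosen because it reproduces the eight errors quoted above. The other numbers come from the text.

```python
CHARS_PER_ERROR = 300   # from the text: one typist error per 300 characters
WORDS_PER_PAGE = 400    # from the text: the example page length
CHARS_PER_WORD = 6      # assumption: five letters plus one space

# Expected typist errors on one page.
typist_errors = WORDS_PER_PAGE * CHARS_PER_WORD / CHARS_PER_ERROR

# Upper bound on OCR errors implied by "over 99%" word accuracy:
# fewer than one word in every 100.
ocr_error_ceiling = WORDS_PER_PAGE / 100

# Time for a typist to enter one page at the quoted 30-40 words per minute.
minutes_at_40_wpm = WORDS_PER_PAGE / 40
minutes_at_30_wpm = WORDS_PER_PAGE / 30

print(f"Typist errors per page: {typist_errors:.0f}")
print(f"OCR errors per page: fewer than {ocr_error_ceiling:.0f}")
print(f"Typing time per page: {minutes_at_40_wpm:.0f} to {minutes_at_30_wpm:.1f} minutes")
```

Because the OCR figure is a ceiling rather than an exact count, the typist's eight errors come out as more than twice the OCR average, not exactly twice.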
By the way, when we talk about 99+ percent accuracy, we're not talking about the format retention features of OmniPage Pro. Format retention, something human typists cannot do at all, is a feature that lets you save as much or as little of the original document's layout as you want, using easily modified settings. This means information like multiple columns, headline size and location, and graphics can be preserved throughout the OCR process. Format retention, while highly reliable, is a separate category from text accuracy.
You'll notice that we restrict our text accuracy claim to laser-quality original documents and standard fonts. When performing OCR, you should always use the best quality document possible. Text degradation, which we can't control, can adversely affect optical character recognition. Faxes, photocopies, documents printed with an inkjet or dot matrix printer, even documents from a laser printer whose toner is low, are more difficult for any OCR program to recognize. Similarly, every OCR program will have trouble with type fonts that deviate widely from standard fonts, simply because the characters are too different. Even humans, who are far better than computers at recognizing characters, sometimes have trouble reading novelty fonts.
We hope we've answered your questions about OmniPage Pro's superior accuracy. For further information about OmniPage Pro or other Caere products such as OmniForm, OmniWeb or PageKeeper, see the Main Page on this CD.